Identification of protein coding regions in the human genome by quadratic discriminant analysis.
Abstract
A new method for predicting internal coding exons in genomic DNA sequences has been developed. This method is based on a prediction algorithm that uses the quadratic discriminant function for multivariate statistical pattern recognition. Substantial improvements have been made (with only 9 discriminant variables) when compared with existing methods: HEXON [Solovyev, V. V., Salamov, A. A. & Lawrence, C. B. (1994) Nucleic Acids Res. 22, 5156-5163] (based on linear discriminant analysis) and GRAIL2 [Uberbacher, E. C. & Mural, R. J. (1991) Proc. Natl. Acad. Sci. USA 88, 11261-11265] (based on neural networks). A computer program called MZEF is freely available to the genome community and allows users to adjust prior probability and to output alternative overlapping exons.
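As a rough illustration of how a quadratic discriminant function with an adjustable prior probability can score a candidate exon, the sketch below fits one Gaussian per class and returns the posterior probability of the exon class. This is not the MZEF implementation: the function names, the two synthetic features, and the prior values are all assumptions chosen for illustration, whereas the abstract states only that nine discriminant variables are used.

```python
import numpy as np

def fit_gaussian(X):
    """Return the sample mean and covariance of the rows of X."""
    mu = X.mean(axis=0)
    cov = np.cov(X, rowvar=False)
    return mu, cov

def log_gaussian(x, mu, cov):
    """Log density of a multivariate normal evaluated at x."""
    d = x - mu
    _, logdet = np.linalg.slogdet(cov)
    maha = d @ np.linalg.solve(cov, d)  # squared Mahalanobis distance
    return -0.5 * (len(mu) * np.log(2 * np.pi) + logdet + maha)

def qda_posterior(x, exon_params, non_params, prior_exon=0.5):
    """Posterior probability that x belongs to the exon class.

    Each class has its own covariance matrix, so the decision
    boundary is quadratic in x rather than linear.
    """
    log_e = log_gaussian(x, *exon_params) + np.log(prior_exon)
    log_n = log_gaussian(x, *non_params) + np.log(1.0 - prior_exon)
    m = max(log_e, log_n)  # subtract the max for numerical stability
    pe, pn = np.exp(log_e - m), np.exp(log_n - m)
    return pe / (pe + pn)

# Synthetic training data (hypothetical two-feature example).
rng = np.random.default_rng(0)
exon_X = rng.normal(loc=[2.0, 2.0], scale=[1.0, 0.5], size=(500, 2))
non_X = rng.normal(loc=[0.0, 0.0], scale=[0.5, 1.5], size=(500, 2))
exon_params = fit_gaussian(exon_X)
non_params = fit_gaussian(non_X)

# Score one candidate under two different prior probabilities.
x = np.array([1.0, 1.0])
p_equal = qda_posterior(x, exon_params, non_params, prior_exon=0.5)
p_low = qda_posterior(x, exon_params, non_params, prior_exon=0.05)
```

Lowering `prior_exon` reduces the posterior for the same candidate, which is one plausible way a user-adjustable prior could trade sensitivity for specificity.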
Similar resources
Long non-coding RNAs and their significance in human diseases
Protein-coding genes account for only a small fraction of the human genome and most of the genomic sequences are transcriptionally silent, but recent observations indicate significant functional elements, including non-coding protein transcripts in the human genome. Long non-coding RNAs (lncRNAs) have been defined as transcripts of >200 nucleotides without protein-coding capacity that perform t...
Phylogenetic Analysis of Three Long Non-coding RNA Genes: AK082072, AK043754 and AK082467
It is now clear that protein is just one of the functional products of the eukaryotic genome. Indeed, a major part of the human genome is transcribed into non-coding sequences rather than protein-coding sequences. In this study, we selected three long non-coding RNAs namely AK082072, AK043754 and AK082467 which show brain expression and local region conservation among vertebr...
The Validation of the Thermal Regions in Iran with an Emphasis on the Identification of the Climatic Cycles
Background: The present study aimed to validate the thermal regions in Iran with an emphasis on the identification of the climatic cycles during the recent half-century. Methods: Data on daily temperature were extracted for 383 synoptic stations of Iran Meteorological Organization. For the zoning of the temperatures of Iran, multivariate statistical techniques (cluster analysis and discriminan...
P-85: How a Frame Shift Caused by a Single Base Deletion In SEPT12 Gene Shed Lights As a Polymorphism
Background: Septins are members of highly conserved polymerizing GTP binding proteins well described in the animal kingdom. Fourteen septin proteins have been characterized in humans (SEPT1-SEPT14), some of which are tissue-specific. All 14 genome-mapped human septins contain a highly conserved central GTP-binding domain, which is critical for GTPase signaling properties as well as oligomerizat...
The Gene-Finder Computer Tools for Analysis of Human and Model Organisms Genome Sequences
We present a complex of new programs for promoter, 3'-processing, splice sites, coding exons and gene structure identification in genomic DNA of several model species. The human gene structure prediction program FGENEH, exon prediction-FEXH and splice site prediction-HSPL have been modified for sequence analysis of Drosophila (FGENED, FEXD and DSPL), C. elegans (FGENEN, FEXN and NSPL), Yeast (F...
Journal title:
- Proceedings of the National Academy of Sciences of the United States of America
Volume 94, Issue 2
Pages: -
Publication date: 1997